Subspace MOA: Subspace Stream Clustering Evaluation Using the MOA Framework
Abstract
Most available static data are becoming increasingly high-dimensional. Therefore, subspace clustering, which aims at finding clusters not only in the full-dimensional space but also within subgroups of dimensions, has gained significant importance. Recently, the OpenSubspace framework was proposed to evaluate and explore subspace clustering algorithms in WEKA, with a rich body of state-of-the-art subspace clustering algorithms and measures. In parallel, the MOA (Massive Online Analysis) framework was developed, also on top of WEKA, to provide algorithms and evaluation methods for mining tasks on evolving data streams, but over the full space only. As with static data, most streaming data sources are becoming high-dimensional, and tracking their evolving clusters is becoming both important and challenging. In this demonstrator, we present what is, to the best of our knowledge, the first subspace clustering evaluation framework over data streams, called Subspace MOA. Our demonstrator follows the online-offline model used in most data stream clustering algorithms. In the online phase, users can select one of seven well-known summarization techniques to form the micro-clusters. In the offline phase, one of ten subspace clustering algorithms can be selected. The framework is supported by a subspace stream generator, a visualization interface that presents the evolving clusters over different subspaces, and various subspace clustering evaluation measures.
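The online-offline model mentioned in the abstract can be illustrated with a minimal Python sketch. It is not Subspace MOA code and does not use the MOA API: the names MicroCluster, online_phase, offline_phase, radius and max_var are hypothetical, the summary is a generic count / linear-sum / squared-sum feature vector, and the offline step is a toy stand-in that merely flags low-variance dimensions rather than any of the ten subspace algorithms the abstract refers to.

```python
import math
from dataclasses import dataclass


@dataclass
class MicroCluster:
    """Additive summary of absorbed points: count, linear sum, squared sum."""
    n: int
    ls: list
    ss: list

    @classmethod
    def from_point(cls, p):
        return cls(1, list(p), [x * x for x in p])

    def add(self, p):
        # The summary is updated in place; the raw point is not stored.
        self.n += 1
        for i, x in enumerate(p):
            self.ls[i] += x
            self.ss[i] += x * x

    def center(self):
        return [s / self.n for s in self.ls]

    def variance(self):
        # Per-dimension variance E[x^2] - E[x]^2, clamped at 0 against rounding.
        return [max(0.0, q / self.n - (s / self.n) ** 2)
                for s, q in zip(self.ls, self.ss)]


def online_phase(stream, radius):
    """Absorb each point into the nearest micro-cluster, or open a new one."""
    mcs = []
    for p in stream:
        best = min(mcs, key=lambda m: math.dist(m.center(), p), default=None)
        if best is not None and math.dist(best.center(), p) <= radius:
            best.add(p)
        else:
            mcs.append(MicroCluster.from_point(p))
    return mcs


def offline_phase(mcs, max_var):
    """Toy stand-in for a subspace step: keep dimensions with small spread."""
    return [[d for d, v in enumerate(m.variance()) if v <= max_var]
            for m in mcs]


# Hypothetical 3-dimensional stream: one group is tight in dimensions 0 and 1,
# the other is tight in dimensions 0 and 2.
stream = [
    (0.0, 0.0, -4.0), (100.0, -3.0, 7.0),
    (0.1, 0.1, 4.0), (100.1, 3.0, 7.1),
    (0.0, 0.1, 0.0), (100.0, 0.0, 7.0),
]
micro_clusters = online_phase(stream, radius=10.0)
subspaces = offline_phase(micro_clusters, max_var=0.5)
```

On this toy stream the sketch ends with two micro-clusters, flagging dimensions 0 and 1 for the first and dimensions 0 and 2 for the second. The point of the split is that the online phase touches each arriving point only once and keeps a compact summary, while the more expensive subspace analysis runs on the summaries on demand.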
Similar resources
Effective Evaluation Measures for Subspace Clustering of Data Streams
Nowadays, most streaming data sources are becoming high-dimensional. Accordingly, subspace stream clustering, which aims at finding evolving clusters within subgroups of dimensions, has gained significant importance. However, existing subspace clustering evaluation measures are mainly designed for static data and cannot reflect the quality of the evolving nature of data streams. On the other ...
Benchmarking Stream Clustering Algorithms within the MOA Framework
In today’s applications, massive, evolving data streams are ubiquitous. To gain useful information from this data, real-time clustering analysis for streams is needed. A multitude of stream clustering algorithms have been introduced. However, assessing the effectiveness of such an algorithm is challenging, because up to now there has been no tool that allows a direct comparison of these algorithms. We pre...
MOA: Massive Online Analysis, a Framework for Stream Classification and Clustering
In today’s applications, massive, evolving data streams are ubiquitous. Massive Online Analysis (MOA) is a software environment for implementing algorithms and running experiments for online learning from evolving data streams. MOA is designed to deal with the challenging problems of scaling up the implementation of state-of-the-art algorithms to real-world dataset sizes and of making algorithm...
Stream Data Mining: Platforms, Algorithms, Performance Evaluators and Research Trends
Streaming data are potentially infinite sequences of data that arrive at very high speed and may evolve over time. This poses several challenges for mining large-scale, high-speed data streams in real time. Hence, the field has attracted considerable attention from researchers in recent years. This paper discusses various challenges associated with mining such data streams. Several available stream da...
Evaluation of Monte Carlo Subspace Clustering with OpenSubspace
We present the results of a thorough evaluation of the subspace clustering algorithm SEPC using the OpenSubspace framework. We show that SEPC outperforms competing projected and subspace clustering algorithms on synthetic and some real-world data sets. We also show that SEPC can be used to effectively discover clusters with overlapping objects (i.e., subspace clustering).